Annotation Guidelines for Chinese-Korean Word Alignment
Abstract
For a language pair such as Chinese and Korean, whose members belong to entirely different language families in terms of both typology and genealogy, correspondences are often hard to identify in word alignment. We present annotation guidelines for Chinese-Korean word alignment based on a contrastive analysis of morpho-syntactic encodings. We discuss the differences in the verbal systems, which cause most of the linking obscurities in the annotation process, and compare the two verbal systems systematically by analyzing their morpho-syntactic encodings. Taking the viewpoint of grammatical category allows us to define consistent and systematic instructions for linguistically distant languages such as Chinese and Korean. The scope of our guidelines is limited to alignment between Chinese and Korean, but the instruction methods exemplified in this paper can also be applied to develop systematic and comprehensible alignment guidelines for other language pairs that exhibit similarly divergent linguistic phenomena.
Similar resources
Automatic word alignment tools to scale production of manually aligned parallel texts
We have been creating large-scale manual word alignment corpora for Arabic-English and Chinese-English language pairs in genres such as newswire, broadcast news and conversation, and web blogs. We are now meeting the challenge of word aligning further varieties of web data for Chinese and Arabic “dialects”. Human word alignment annotation can be costly and arduous. Alignment guidelines may be im...
Word Alignment Annotation in a Japanese-Chinese Parallel Corpus
Parallel corpora are critical resources for machine translation research and development, since they contain translation equivalences of various granularities. Manual annotation of word alignment is important for providing a gold standard for developing and evaluating both example-based and statistical machine translation models. This paper presents the work...
Distant annotation of Chinese tense and modality
In this paper we describe a “distant annotation” method by which we mark up tense and modality of Chinese eventualities via a word-aligned parallel corpus. We first map Chinese verbs to their English counterparts via word alignment, and then annotate the resulting English text spans with coarse-grained tense and modality categories that we believe apply to both English and Chinese. Because Englis...
Automatic Adaptation of Annotations
Manually annotated corpora are indispensable resources, yet for many annotation tasks, such as the creation of treebanks, there exist multiple corpora with different and incompatible annotation guidelines. This leads to an inefficient use of human expertise, but it could be remedied by integrating knowledge across corpora with different annotation guidelines. In this article we describe the pro...
Buy one get one free: Distant annotation of Chinese tense, event type and modality
We describe a “distant annotation” method where we mark up the semantic tense, event type, and modality of Chinese events via a word-aligned parallel corpus. We first map Chinese verbs to their English counterparts via word alignment, and then annotate the resulting English text spans with coarse-grained categories for semantic tense, event type, and modality that we believe apply to both Engli...